Learning Shallow Syntactic Dependencies from Imbalanced Datasets: A Case Study in Modern Greek and English
Abstract
The present work aims to build a shallow parser that detects subjects and objects in Modern Greek using machine learning techniques. The parser relies on limited resources. Experiments with equivalent input and the same learning techniques were also conducted for English, showing that the methodology can be adapted to other languages with only minor modifications. In addition, the class imbalance problem in syntactically annotated Modern Greek data is, for the first time, successfully addressed.
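The abstract does not say which technique was used to handle the class imbalance. Purely as an illustration, the sketch below shows random oversampling, one common baseline remedy, and is not the authors' method. It assumes a hypothetical three-way labeling of candidate phrases (SUBJ, OBJ, NONE) in which NONE dominates, and hypothetical toy features (morphological case and position relative to the verb); the function name `random_oversample` is likewise illustrative.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Random oversampling: duplicate randomly chosen examples of each
    smaller class until every class matches the largest class count.
    Returns new lists; the inputs are left unchanged."""
    rng = random.Random(seed)  # fixed seed for reproducible duplicates
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        # All original examples belonging to this class.
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: each candidate phrase is described by
# (morphological case, position relative to the verb).
X = [("nom", "before"), ("acc", "after"), ("gen", "after"), ("acc", "before"),
     ("gen", "before"), ("dat", "after"), ("gen", "after"), ("acc", "after")]
y = ["SUBJ", "OBJ", "NONE", "NONE", "NONE", "NONE", "NONE", "OBJ"]

X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [('NONE', 5), ('OBJ', 5), ('SUBJ', 5)]
```

Oversampling of this kind should be applied only to the training split, so that evaluation still reflects the real class distribution. Undersampling the majority class or cost-sensitive learning are alternative remedies with different trade-offs.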
Similar resources
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Learning Subcategorization Frames from Corpora: a Case Study for Modern Greek
Certain Natural Language Processing (NLP) applications such as parsing and semantic processing require complete lexicons that provide subcategorization information for a word of interest, i.e. the necessary information about the set(s) of syntactic constituents the word must combine with, in order for its meaning to be fully expressed. Modern Greek presents high flexibility in the allowable ord...
Online Processing of English Wh-Dependencies by Iranian EFL Learners
To be able to reach the level of ultimate attainment in the second language, learners need to acquire not only the grammar of the L2 but also the language processing mechanisms involved in the comprehension of sentences in real time. Despite its importance, very little is yet known about online L2 processing. This study examines whether advanced Iranian learners of English reactivate disloc...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Learning of relative clauses by L3 learners of English
In surveys of third language acquisition (TLA) research, mixed results demonstrate that there is no consensus among researchers regarding the advantages and/or disadvantages of bilinguality on TLA. The main concern of the present study was, thus, to probe the probable differences between Persian monolingual and Azeri-Persian bilingual learners of English regarding their...